ABSTRACT
We describe a novel approach for evaluating SNP genotypes of a genome-wide association scan to identify "ethnic outlier" subjects whose ethnicity is different or admixed compared to most other subjects in the genotyped sample set. Each ethnic outlier is detected by counting a genomic excess of "rare" heterozygotes and/or homozygotes whose frequencies are low (<1%) within genotypes of the sample set being evaluated. This method also enables simple and striking visualization of non-Caucasian chromosomal DNA segments interspersed within the chromosomes of ethnically admixed individuals. We show that this visualization of the mosaic structure of admixed human chromosomes gives results similar to another visualization method (SABER) but with much less computational time and burden. We also show that other methods for detecting ethnic outliers are enhanced by evaluating only genomic regions of visualized admixture rather than diluting outlier ancestry by evaluating the entire genome in aggregate. We have validated our method in the Wellcome Trust Case Control Consortium (WTCCC) study of 17,000 subjects as well as in HapMap subjects and simulated outliers of known ethnicity and admixture. The method's ability to precisely delineate chromosomal segments of non-Caucasian ethnicity has enabled us to demonstrate previously unreported non-Caucasian admixture in two HapMap Caucasian parents and in a number of WTCCC subjects. Its sensitive detection of ethnic outliers and simple visual discrimination of discrete chromosomal segments of different ethnicity imply that this method of rare heterozygotes and homozygotes (RHH) is likely to have diverse and important applications in humans and other species.
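The counting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rhh_counts`, the 0/1/2 genotype coding, and the toy data are assumptions made for illustration. Only the 1% genotype-frequency threshold comes from the abstract.

```python
from collections import Counter

def rhh_counts(genotypes, threshold=0.01):
    """Count, per subject, the genotypes that are rare in this sample set.

    genotypes: dict mapping subject ID -> list of genotype codes, one per SNP
               (assumed coding: 0, 1, 2, or None for a missing call).
    threshold: genotype frequency below which a genotype counts as rare.
    """
    subjects = list(genotypes)
    n_snps = len(next(iter(genotypes.values())))
    counts = {s: 0 for s in subjects}
    for j in range(n_snps):
        calls = [genotypes[s][j] for s in subjects if genotypes[s][j] is not None]
        if not calls:
            continue
        freq = Counter(calls)
        # Genotypes whose within-sample frequency falls below the threshold
        rare = {g for g, c in freq.items() if c / len(calls) < threshold}
        for s in subjects:
            if genotypes[s][j] in rare:
                counts[s] += 1
    return counts

# Toy data (hypothetical): 199 subjects homozygous at 5 SNPs, plus one subject
# carrying genotypes that nobody else in the sample has at 4 of them.
sample = {f"subj{i}": [0, 0, 0, 0, 0] for i in range(199)}
sample["outlier"] = [1, 1, 1, 2, 0]
counts = rhh_counts(sample)
```

In this toy sample the outlier's unusual genotypes each occur in 1 of 200 subjects (0.5%), so they fall under the threshold. A subject whose count is far above the rest of the sample would be a candidate ethnic outlier; how large an excess is needed is not specified in the abstract.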
Subjects
Human Chromosomes , Human Genome , Genome-Wide Association Study/methods , Genotype , Single Nucleotide Polymorphism , White People/genetics , Algorithms , Genetic Markers , Population Genetics , Humans , Linkage Disequilibrium , Genetic Models
ABSTRACT
BMD values in approximately 3000 perimenopausal Scottish women were adjusted by regression to identify and account for nongenetic factors. Adjusted BMD values were not associated with simple tandem repeat (STR) markers or single nucleotide polymorphisms (SNPs) at the Cathepsin K (CTSK) locus. We present a thorough analysis of common CTSK polymorphisms and genetic relatedness among CTSK haplotypes. INTRODUCTION: CTSK is a cysteine protease of the papain family and is thought to play a critical role in osteoclast-mediated bone degradation. Rare, inactivating mutations in CTSK cause pycnodysostosis, an autosomal recessive osteochondrodysplasia characterized by osteosclerosis and short stature. However, there have been no studies of common genetic variants in CTSK and their possible association with bone density in the general population. MATERIALS AND METHODS: To identify common SNPs and STR polymorphisms in and around CTSK, we screened all CTSK exons, intron A, all intron-exon boundaries, and the putative CTSK promoter region in 130 random whites using both high-performance liquid chromatography (HPLC) and DNA sequencing. CTSK markers were genotyped in approximately 3000 perimenopausal Scottish women whose hip and spine bone mineral density (BMD) had been measured by DXA. We performed linear regression analysis to identify and adjust for nongenetic predictors of BMD, and adjusted BMD values (regression residuals) were tested for association with individual CTSK markers and haplotypes by ANOVA and the composite haplotype method of Zaykin et al. RESULTS AND CONCLUSIONS: We discovered two intronic SNPs (8% and 9% frequency), but no common exonic SNPs (>1% frequency), and found that three STRs at the immediate 5' end of the CTSK locus are highly polymorphic. The population frequencies of haplotypes defined by these five polymorphisms were estimated, and a cladogram was derived showing proximity of relationship and likely descent of the 30 most common CTSK haplotypes. Regression analyses revealed that nongenetic factors accounted for approximately 39% of the rate of change in BMD at the spine and 19% at the hip. For baseline BMD values in premenopausal women, nongenetic predictors explained 11% of the variance at the spine and 13% at the hip. Adjusted BMD values showed no statistically significant association with any of the individual CTSK polymorphisms or CTSK haplotypes.
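The two-stage analysis described in this abstract (regress BMD on nongenetic predictors, then test the residuals against genotype by ANOVA) can be sketched as follows. This is an illustrative sketch, not the study's analysis: it assumes a single hypothetical covariate (age), made-up data, and hypothetical function names, whereas the study adjusted for several nongenetic predictors and also used the composite haplotype method.

```python
from statistics import mean

def residuals(y, x):
    """Residuals from a simple least-squares regression of y on one covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def anova_f(values, groups):
    """One-way ANOVA F statistic for values split by group label."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    grand = mean(values)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in by_group.values())
    ss_within = sum((x - mean(v)) ** 2 for v in by_group.values() for x in v)
    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical toy data: BMD (g/cm^2), age (years), genotype at one SNP
bmd = [1.02, 0.95, 0.88, 0.91, 0.80, 0.85, 0.99, 0.78]
age = [45, 47, 50, 49, 54, 52, 46, 55]
genotype = ["AA", "AG", "AG", "AA", "GG", "AG", "AA", "GG"]

adjusted_bmd = residuals(bmd, age)           # BMD with the age effect removed
f_stat = anova_f(adjusted_bmd, genotype)     # association test on the residuals
```

A large F statistic relative to the F distribution with the corresponding degrees of freedom would suggest an association between genotype and adjusted BMD. The sketch does not compute a p-value or correct the degrees of freedom for the covariates already fitted.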